Extraction of long k-mers using spaced seeds
Abstract
The extraction of k-mers from sequencing reads is an important task in many bioinformatics applications, such as all DNA sequence analysis methods based on de Bruijn graphs. These methods tend to be more accurate when the k-mers used are unique in the analyzed DNA, and thus the use of longer k-mers is preferred. When the read lengths of short-read sequencing technologies increase, the error rate will become the determining factor for the largest possible value of k. Here we propose LoMeX, which uses spaced seeds to extract long k-mers accurately even in the presence of sequencing errors. Our experiments show that LoMeX can extract long k-mers from current Illumina reads with a higher recall than a standard k-mer counting tool. Furthermore, our experiments on simulated data show that when the read length further increases, the performance of standard k-mer counters declines, whereas LoMeX still extracts long k-mers successfully.
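The idea of using spaced seeds to tolerate sequencing errors can be illustrated with a minimal Python sketch. This is not the LoMeX algorithm itself, which the abstract does not describe in detail; it is a simplified illustration under stated assumptions. The function names `seed_key` and `extract_kmers`, the mask `"1101011"`, and the `min_count` threshold are all hypothetical. The sketch groups k-mers that agree on the fixed ('1') positions of a spaced seed and then takes a column-wise majority vote, so an error that falls on a don't-care ('0') position does not split the group.

```python
from collections import Counter, defaultdict


def seed_key(kmer, mask):
    """Keep only the characters at positions where the mask has '1'."""
    return "".join(c for c, m in zip(kmer, mask) if m == "1")


def extract_kmers(reads, mask, min_count=2):
    """Group k-mers by spaced-seed key and return a consensus per group.

    Illustrative assumption: k equals the mask length, and a group is
    reported only if it contains at least `min_count` k-mer occurrences.
    """
    k = len(mask)
    groups = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            groups[seed_key(kmer, mask)].append(kmer)

    result = {}
    for kmers in groups.values():
        if len(kmers) < min_count:
            continue
        # Column-wise majority vote over all k-mers sharing the seed key.
        consensus = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*kmers)
        )
        result[consensus] = len(kmers)
    return result


# Three reads cover the same locus; the second has an error at a
# don't-care position of the mask (index 2).
reads = ["ACGTACG", "ACTTACG", "ACGTACG"]
print(extract_kmers(reads, "1101011"))  # {'ACGTACG': 3}
```

An exact k-mer counter would report `ACGTACG` twice and the erroneous `ACTTACG` once, whereas the seed-based grouping recovers all three occurrences as a single k-mer. A single fixed mask only tolerates errors at its don't-care positions, so this sketch understates what a full method would need.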
Similar resources
Clustering metagenomic reads using spaced k-mers
With the emergence of next-generation sequencing technologies, the classification of short reads in a metagenomic sample has become an important yet difficult task. Several tools attempt to tackle this problem with each having a strong point in certain situations. Herein, a novel method is proposed that has its strong point in processing short reads. It is based on two new concepts: utilizing m...
Alignment-free sequence comparison with spaced k-mers
Alignment-free methods are increasingly used for genome analysis and phylogeny reconstruction since they circumvent various difficulties of traditional approaches that rely on multiple sequence alignments. In particular, they are much faster than alignment-based methods. Most alignment-free approaches work by analyzing the k-mer composition of sequences. In this paper, we propose to use ‘spaced ...
Spaced Seeds Design Using Perfect Rulers
We consider the problem of lossless spaced seed design for approximate pattern matching. We show that, using mathematical objects known as perfect rulers, we can derive a family of spaced seeds for matching with up to two errors. We analyze these seeds with respect to the trade-off they offer between seed weight and the minimum length of the pattern to be matched. We prove that for patterns of ...
Spaced seeds improve k-mer-based metagenomic classification
MOTIVATION: Metagenomics is a powerful approach to study genetic content of environmental samples, which has been strongly promoted by next-generation sequencing technologies. To cope with massive data involved in modern metagenomic projects, recent tools rely on the analysis of k-mers shared between the read to be classified and sampled reference genomes. RESULTS: Within this general framework...
Vector seeds: An extension to spaced seeds
We present improved techniques for finding homologous regions in DNA and protein sequences. Our approach focuses on the core regions of a local pairwise alignment; we suggest new ways to characterize these regions that allow marked improvements in both specificity and sensitivity over existing techniques for sequence alignment. For any such characterization, which we call a vector seed, we give...
Journal
Journal title: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Year: 2021
ISSN: 2374-0043, 1557-9964, 1545-5963
DOI: https://doi.org/10.1109/tcbb.2021.3113131